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Abstract: In this paper, we make contact with the field of compressive 
sensing and present a development and generalization of tools and results 
for reconstructing irregularly sampled tomographic data. In particular, we 
focus on denoising Spectral-Domain Optical Coherence Tomography 
(SDOCT) volumetric data. We take advantage of customized scanning 
patterns, in which, a selected number of B -scans are imaged at higher 
signal-to-noise ratio (SNR). We learn a sparse representation dictionary for 
each of these high- SNR images, and utilize such dictionaries to denoise the 
low-SNR B -scans. We name this method multiscale sparsity based 
tomographic denoising (MSBTD). We show the qualitative and quantitative 
superiority of the MSBTD algorithm compared to popular denoising 
algorithms on images from normal and age-related macular degeneration 
eyes of a multi-center clinical trial. We have made the corresponding data 
set and software freely available online. 
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1. Introduction 

In this paper, we introduce an image restoration method that attempts to recover the noiseless 
high-frequency information corrupted by the limitations of the state-of-the-art medical 
tomographic imaging systems. In particular, we focus on the spectral domain optical 
coherence tomography (SDOCT) systems [1,2], and show the applicability of our algorithm in 
reducing speckle noise [3,4]. 

Previous works in removing speckle noise can be categorized into two groups: model- 
based single-frame or multi-frame averaging techniques, as shown in Fig. 1. The first group 
utilizes classic denoising methods, which often assume an a priori parametric or non- 
parametric model for the signal and noise (e.g. Wiener filtering [5], kernel regression [6], or 
wavelets [7]). Theoretical bounds on the performance of such approaches are described in [8]. 
A few such algorithms already have been utilized for denoising ocular SDOCT images 
including [9-12] (Fig. 1(a)). The second group relies on capturing a sequence of repeated B- 
scans from a unique position. In a post-processing step, these images are registered and 
averaged, to create a less noisy image [13] (Fig. 1(b)). Some SDOCT imaging systems such as 
Spectralis (Heidelberg Engineering, Heidelberg, Germany) have a built-in image stabilization 
and averaging system and can directly produce high-SNR images. For others, such as 
Bioptigen SDOCT system (Bioptigen Inc., Research Triangle Park, NC), registration and 
averaging is done post image capturing phase. Indeed, for either of these SDOCT systems, 
averaging based denoising dramatically increases the image acquisition time. To reduce image 
acquisition time, a very recent work [14] utilizes wavelets for denoising a sequence of 
repeated SDOCT B -scans captured from a unique position. 
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Multiple frames captured from a unique position 





Fig. 1 . Two classic methods for removing noise in SDOCT image, (a) Denoising by a model- 
based single-frame technique [7]. (b) Denoising by multi-frame averaging technique: SDOCT 
frames acquired from the same location are averaged to increase SNR. 

In this work, we present a novel hybrid approach which is called multiscale sparsity based 
tomographic denoising (MSBTD) for denoising volumetric SDOCT scans, where the imaged 
volume is sampled at several azimuthally distanced B -scans (Fig. 2(a)). The MSBTD method 
utilizes a non-uniform scanning pattern, in which, a fraction of B -scans are captured slowly at 
a relatively higher than nominal signal-to-noise ratio (SNR). The rest of the B -scans are 
captured fast at the nominal SNR. Utilizing compressive sensing principles [15-21], we learn 
a sparse representation dictionary for each of these high-SNR images and utilize these 
dictionaries to denoise the neighboring low- SNR B- scans. The rationale for this approach is 
that neighboring B -scans, in common SDOCT volumes, are expected to have similar texture 
and noise pattern, as illustrated in Fig. 2. The exciting property of our approach is that, unlike 
[14], it does not require capturing more than one B-scan from the majority of azimuthal 
positions and therefore, it requires significantly less scanning time. Moreover, the MSBTD 
method is well suited to retrospectively improve the quality of images in databanks from large 
scale multi-center clinical trials [22]. Note that, traditionally, ocular SDOCT imaging 
protocols often require at least two types of SDOCT scans: the first, a densely sampled (but 
low-SNR) large field-of-view volumetric scan and the second, sparsely sampled high-SNR 
scans (often consisting of only one image from the fovea). In section 3, we show the 
applicability of the MSBTD method for such data sets. 













(b) 












(c) 



Fig. 2. In common scanning patterns, SDOCT B-scans acquired from adjacent (azimuthally 
distanced) positions (a) have similar features (b,c). (a) In the summed- voxel projection [28,29] 
(S VP) en face SDOCT image of a non-neovascular age-related macular degeneration (AMD) 
patient, (b,c) B-Scans are acquired from the location of the blue and yellow lines, respectively. 
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A few previous works have studied related algorithms in the context of photonics imaging. 
The first hardware implemented experiment demonstrating the superiority of adaptive 
sampling in improving the scanning speed and SNR of ballistic photon based imagers was 
reported in [23]. More recently, application of compressive sensing theory for speeding up the 
SDOCT scanning time has attracted sizable attention [24-27]. While these important papers 
have described different implementations of compressive sampling in SDOCT imaging, they 
provide limited performance analysis, as their experimental results have been only compared 
to basic none-data-adaptive techniques such as cubic interpolation. A fundamental question, 

1. e. "Is sparsity based reconstruction of SDOCT images beneficial as compared to alternative 
modern adaptive restoration/reconstruction techniques?" has yet to be answered. 

To address this question, in Section 2, we introduce the MSBTD algorithm based on 
irregular sampling and sparse restoration, and demonstrate its superiority compared to popular 
modern adaptive denoising algorithms. While the sparse representation objective function for 
the MSBTD method [30,31] (as with virtually all other works in this field) has been known to 
the image processing community, the main novelty of our algorithm is its integration with a 
customized scanning algorithm, which adapts it for clinical SDOCT applications. As noted, an 
interesting aspect of our paper is its universal application. To prove this point, in Section 3, 
unlike many other previous works in this field, we abstain from experimenting on simulated 
or controlled data sets captured in the laboratory environment, which might be biased or 
unfairly optimized in favor of the proposed algorithm. Instead, we test the MSBTD algorithm 
using an unbiased randomly selected deidentified data set from a multi-center clinical trial, the 
imaging protocol of which was designed years ago, independent of the algorithm in this paper. 
Concluding remarks are given in Section 4, discussing future possible applications of the 
proposed technique. 

2. Methods 

In our previous works, we have demonstrated the application of multi-frame image fusion 
methods to overcome the theoretical and practical constraints (e.g. resolution [32] and field- 
of-view [33]) of modern optical imaging systems. As briefly reviewed in the previous section, 
the proposed MSBTD algorithm is based on a customized scanning pattern, in which, a 
minority number of B -scans are captured at higher SNR compared to others. We use these 
high-SNR images to learn a sparse model of underlying anatomical structure in this sequence 
of B-Scans and apply this model to denoise the low-SNR images. 

2.7. Sparse representation of SDOCT images 

A volumetric SDOCT scan, Y e !£^ xMxL 9 consists of L images (B -scans), each representing a 

2D slice, Yj e R NxM , of a 3-D structure. In many clinical applications, captured B-scans are 

very noisy and require image enhancement (denoising) before any visual or quantitative 
analysis. 

In recent years, many compressive sensing/sparse representation based algorithms have 
been proposed as promising alternative approaches to classic denoising techniques [31,34- 
37]. Sparse representation [38] models noiseless signal as a sparse linear combination of basis 
elements, termed atoms, from a dictionary. 

We represent a small rectangular patch (of size N'xM' ) in the desired noiseless image as 
X . A vector representation of this patch can be obtained by lexicographic ordering: 
x g M e ;<2 = N'xM' . This vector can be synthesized by a linear combination of basis 
functions assembled in a matrix called dictionary, D e R QxP , as 

x = Da, (1) 
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where P>Q implies that the dictionary is redundant and aeM p is a vector of scalar 
coefficients. The basic idea of compressive sensing is that when for a class of images the 
majority of elements in vector a corresponding to a particular dictionary are zeros or very 
small (sparsity), relatively few well-chosen measurements suffice to reconstruct the images in 
this class [21]. 

We denote the patch contaminated with noise as x Noise e M e . Since, unlike the SDOCT 
image, noise is not expected to be sparsely represented by D , the sparse solution of the 
denoising problem [34] can be obtained from 

& = argminllall subject to ||x^ se -Da|| < co, (2) 

a 11 110 N 112 

where and ||«[| represent the £ 2 and £ 0 -norms, respectively, and co is the error tolerance. 

A fundamental consideration in employing the above model is the selection of the 
dictionary D . One popular class of sparsity based denoising algorithms exploits the 
information of the noisy image itself to define the dictionary (which we denote as D Noise ) [34] . 
While for many situations this method provides very promising results, however, the high- 
level of noise in the SDOCT images negatively interferes with the learning process, degrades 
the quality of the trained dictionary, and subsequently results in a suboptimal image 
restoration. An alternative (ideal) approach is to learn the dictionary from the noiseless image. 
Since in practice, such an ideal image is not available, we rely on a less noisy image Y Ave , 
obtained from registering and averaging a sequence of (e.g. T) repeated B -scans 
Y/,...,Y/,...,Y Z T from a unique position (Fig. 1). It is natural to assume that the dictionary 
trained on this averaged image, D Ave , is less affected by noise as compared to \y Nois \ 



The original SDOCT image with high noise The averaged SDOCT image with low noise 




Fig. 3. Dictionaries trained on (a) the original low-SNR SDOCT image by the K-SVD 
algorithm, (b) averaged high-SNR image by the K-SVD algorithm, and (c) averaged high-SNR 
image by the proposed MSBTD training algorithm (due to the limited space, we show only the 
first atom of each learned subdictionary). 

To directly compare these two methods, we use the popular K-SVD training algorithm 
[39] and the proposed algorithm described in the following subsection for the dictionary 
learning process. Figure 3(a) shows an illustrative example of a dictionary trained using the 
K-SVD algorithm on a low-SNR B-scan. Figure 3(b,c) show corresponding examples of 
dictionaries trained using the K-SVD and the MSBTD algorithms on an averaged high-SNR 
image, respectively. Visual inspection of the dictionary D Noise suggests that it is more affected 
by noise as compared to D Ave . Ergo, in contrast to [34], we propose to denoise each low-SNR 
image utilizing a dictionary learned from a nearby (or even comparatively distant) averaged 
(high-SNR) image. This strategy allows us to reduce the noise disturbance in the dictionary 
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learning process, which we expect to enhance the denoising outcome, without significantly 
increasing the image acquisition time. In the experiment section, we quantitatively compare 
the performance of such dictionaries in denoising SDOCT images. 

2.2. Multiscale structural dictionary 

As noted, in compressive sensing, design of the dictionary is a critical and often complex 
issue. Applying K-SVD on the whole image is computationally very expensive. An alternative 
for reducing computational complexity is to cut the training image into overlapping fixed-size 
patches, and learn a universal dictionary D on these patches by the K-SVD algorithm [34] or 
its variants [35,40]. Since such a universal dictionary is neither optimal nor efficient to 
represent different structures, we alternatively learn a set of subdictionaries 

{Df r elR exe },/; = l,2,...,K, each best fit a particular structure [41,42]. This is achieved by 
first clustering the training patches into K structural clusters using the k-means approach. We 
will use the centroid of each cluster ( e R Q ) in a later dictionary selection step (Section 

2.3.1). Next, a subdictionary is deduced from each of the K clusters using the principle 
component analysis (PCA) algorithm. 

To effectively capture the properties of different structures and textures on ocular SDOCT 
images (e.g. noting that each retinal layer has a different thickness) with the learned 
dictionary, it is natural to prefer variable sized training patches. This is analogous to the 
adaptive kernel shape and size discussion in the classic image denoising literature (smaller 
kernels are preferred in the texture area, while larger kernels are used in the smooth area) [43]. 
Both approaches [41,42] noted in the above paragraph use fixed size patches, since neither of 
k-means and PCA algorithms (in their primal forms) can deal with the data of varying size. To 
address this problem, the idea of learning multiscale dictionary has been introduced in two 
recent works [37,44]. These methods adopt the K-SVD algorithm to learn subdictionaries on 
each image domain obtained from the wavelet or quadtree decomposition. However, the 
wavelet or quadtree decomposition can only be regarded as a way to downsample the training 
image (zooming out). One may argue that such approaches might not efficiently represent 
detailed features, which are smaller than the base (zeroth- scale) patch (zooming in). 




Fig. 4. Algorithmic flowchart of multiscale structural dictionary learning process. 

To resolve this problem, we incorporate a basic multiscale approach into the structural 
learning process. Specifically, instead of varying the patch size, the training image itself is 
first zoomed in and out through upsampling and downsampling processes. Then, the original 
and these magnified images are divided into same sized patches. Ergo, although the patch size 
is fixed, patches from different magnified scales essentially can be considered as variable 
sized patches from a particular scale. Next, the structural learning process is applied on the 
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patches from the same scale ( s ) to create the multiscale structural dictionary, which is the 

concatenation of the subdictionaries ({D^f},s = l,2,...,S)from all scales. Fi sure 4 shows a 

schematic representation of the multiscale structural dictionary learning process. In this paper, 
we implemented the upsampling and downsampling processes by the bilinear interpolation. 
Indeed, to improve performance, one may use more advanced image resizing algorithms, 
which may result in improved performance, while increasing computational complexity. 

2.3. Nonlocal denoising process 

The previous subsection described a method to learn a multiscale structural dictionary from a 
high-SNR image. This subsection describes how to utilize this learned dictionary for 
denoising low-SNR SDOCT images. As in the dictionary learning step, we divide the low- 
SNR image Y, into overlapping patches y 1 ,...,y.,....,y I of size wx^, where the distance 
between each patch is V pixels in each direction (i.e. for an image of size N x M, the number 
of the extracted patches (7) is (N - w)x(M -z)/(V xV) ). For each patch (y.), we find an 

appropriate subdictionary Df from the learned multiscale structural dictionary to sparsely 

represent the patch, which implicitly results in denoising of that patch. These steps are 
detailed in the following. 

2.3.1. Subdictionary selection 

To find the best subdictionary among the collection of learned subdictionaries for each noisy 
patch, we compare representative features of the patch (y . ) and each subdictionary. To 

represent each subdictionary, we use the centroid atom (c ks g R (m;,z)x1 ) of the corresponding k- 
means cluster (Section 2.2) [42]. As for the representative feature of each patch, we utilize its 
high-frequency component (denoted by yf ) [42]. We find the best fitted subdictionary Df 

for the patch y . based on the normalized correlation [35,45] between c k s and yf : 

A = (fc. , 5. ) = arg max \U k s , yf )| . (3) 

k,s I \ 'I 

2.3.2. Sparse representation of patches 

After finding the appropriate subdictionary, following [31], we define the objective function 
for the sparse representation of the noisy patch as follows: 




where X l and X 2 are scalar Lagrange multipliers and a . is a sparse vector of coefficients for 
the i th denoised patch. The sparse vector p. is the expected value of a sparse vector 
representing the corresponding noiseless patch. While the first regularization term allows us to 
exploit local sparsity in a patch, the second regularization term (through p. ) allows us to 

utilize the non-local similarities (data redundancy) often seen in natural images [31,35,46,47]. 
In practice, the noiseless patch is approximated as the average of / patches with the highest 
similarity to the processed patch y. within a searching window [31]. 

The double-head optimization problem (4) is solved by the iterative reweighted algorithm 
in [31]. In each iteration, one of the unknown parameters is kept constant and the other can be 
efficiently found by greedy pursuit [45,48] or iterative shrinkage algorithms [49,50], which 
replace the £ 0 -norm with the £ l -norm. After finding the sparse representation of all (I) 
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patches (a 1 ,...,a.,....,a I ), the estimate of the denoised patches can be easily obtained from 
Dfa 1 ,...,Dfa.,....,Dfd I . Then, we reformat the estimated patches from lexicographic vectors 

to rectangular shape and return them to their original geometric position. Since these patches 
are highly overlapped, we average their overlapping sections to reconstruct the denoised 
image as described in [34]. The outline of the whole denoising process is shown in Fig. 5. 




Fig. 5. The outline of the MSBTD algorithm. 



2.4. Algorithmic parameters 

For the specific case of denoising retinal SDOCT images, we have empirically selected 
algorithmic parameters, which are kept unchanged for all experiments in this paper. Since 
most of the similar structures in the SDOCT image lie on the horizontal direction, the patch 
and the search window sizes are chosen to be rectangle of size 3 x 20 and 40 x 60 pixels, 
respectively. The distance V between each processed patch is 1 and the number / of similar 
patches in each searching window is 20. In the k-means clustering stage, the cluster number K 
is set to 70. Prior to the multiscale learning process, the original image is upsampled two 
times, each by a factor 1.25 and downsampled three times, each by a factor of 1.5625 to create 
the training images of six scales. The parameters of the iterative re weighted algorithm for 
solving the problem (4) are set to the default values in [31]. 

2.5. Data sets 

We tested our algorithm on randomly selected data sets from 17 eyes from 17 subjects with 
and without nonneovascular age-related macular degeneration (AMD) enrolled in the A2A 
SDOCT study, which was registered at clinicaltrials.gov [22] and approved by the institutional 
review boards (IRBs) of the four A2A SDOCT clinics (Devers Eye Institute, Duke Eye 
Center, Emory Eye Center, and the National Eye Institute). In particular, ten data sets are from 
normal subject, while the rest are from AMD subjects. The study complied with the 
Declaration of Helsinki, and informed consent was obtained from all participants. 

All data sets were captured before the start of this project and the imaging protocol was 
not altered in any form for this study. In the A2A SDOCT study, volumetric scans were 
acquired using SDOCT imaging systems from Bioptigen, Inc. (Research Triangle Park, NC). 
For each patient across all sites, two types of SDOCT scans were captured. 1) A square (-6.6 
x 6.6mm) volume scan with 1000 A-scans and 100 B-scans, which included the fovea. The 
scan sizes and the axial, lateral, and azimuthal resolutions varied slightly by site, and are 
specified in Table 1 of [51]. 2) An azimuthally repeated scan with 1000 A-Scans and 40 B- 
Scans targeted at the fovea. We cropped each A-scan, to achieve B -Scans of size 280 x 1000, 
to only include the informative areas (excluding the smooth dark areas deep below the choroid 
or in the vitreous). 
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2.6. Data processing 



We registered the azimuthally repeated scan using the Image J (software; National Institutes of 
Health, Bethesda, Maryland, USA) StackReg Registration plug-in [52]. We implemented all 
other image processing tasks in MATLAB (MathWorks, Natick, MA). To implement the 
iterative reweighted algorithm, we used the code provided in [53]. In addition, we employed 
the MATLAB code in [54] to estimate the noise variance in the test images. 

2.7. Quantitative measures of performance 

We used mean-to-standard-deviation ratio (MSR) [55], contrast-to-noise ratio (CNR) [56], 
and peak signal-to-noise-ratio (PSNR) to compare the performance of different denoising 
algorithms. The metrics MSR and CNR are defined as follows: 

MSR = ^, (5) 
I u f -u h \ 

CNR = _ _ y= 2 , (6) 
^.5{a 2 f +a 2 b ) 

where ju b and <J b are the mean and the standard deviation of the background region (e.g. red 

box #1 in Fig. 6), while ju f and <j f are the mean and the standard deviation of the foreground 

regions (e.g. red box #2-6 in Fig. 6). 

As a global model of image quality, we used the peak signal-to-noise-ratio (PSNR) [57], 
defined as 



PSNR = 20 -log 




where is the h th pixel in the reference noiseless image R , R^ represents the h th pixel of the 
denoised image R , H is the total number of pixels, and Max R is the maximum intensity 
value of R . 

Since there is no ideal "noiseless" image available for these clinical experiments, in 
Experiment 1, we used the averaged foveal image as a noiseless approximation to the 
corresponding foveal B-scan from the noisy volumetric scan. We identified the low-SNR 
foveal B-scan corresponding to the averaged scan by visual comparison. To have the best 
matching, the averaged and noisy images were registered using the StackReg Registration 
plug-in for ImageJ [52]. We note that in practice, such visual comparison is not necessary and 
the performance of our algorithm is independent of information about the location of the 
averaged high-SNR B-scan. We use the averaged B-scan location only to generate 
quantitative performance metrics, i.e. PSNR. In Experiment 2, we tested on more distant B- 
scans for which averaged (noiseless) image were not available. In these cases, we only 
reported MSR and CNR values. 

3. Experimental results 

To evaluate the effectiveness of the proposed MSBTD method, we compared its performance 
with those from four well-known denoising approaches: Tikhonov [58], New SURE [59], K- 
SVD [34], and BM3D [47]. In these experiments, the parameters for the Tikhonov [58] 
method are tuned to reach the best results, while the parameters of the New SURE, K-SVD, 
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(m) (n) 



Fig. 6. Denoising results for two test SDOCT retinal images using the Tikhonov [58], New 
SURE [59], K-SVD [34], BM3D [47], and the proposed MSBTD method. The left and right 
columns show the foveal images from a normal subject and an AMD patient, respectively, (a, 
b) The averaged (high-SNR) images for the multiscale structural dictionary training, (c, d) 
Low-SNR noisy foveal images from the volumetric scans, (e, f) Denoising results using the 
Tikhonov method, (g, h) Denoising results using the New SURE method, (i, j) Denoising 
results using the K-SVD method, (k, 1) Denoising results using the BM3D method, (m, n) 
Denoising results using the MSBTD method. 

and BM3D methods are set to the default values as in their corresponding references 
[34,47,59]. 

3.1. Experiment 1: denoising based on learned dictionary from a nearby high-SNR Scan 

Figure 6. shows qualitative comparisons of two raw SDOCT retinal images (from a normal 
and an AMD subject) and their denoised versions obtained from the noted denoising methods. 
The raw (low-SNR) images are the geometrically closest (less than 66um distanced) B -scans 
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to the averaged (high-SNR) images. Since the boundaries between retinal layers contain 
important anatomic and pathologic information [60], three boundary regions (boxes #2, 3, 4) 
in these images are marked with red rectangle and magnified. As can be observed, the 
Tikhonov [58] and New SURE [59] methods provide limited noise suppression for the test 
images. Although the K-SVD method better suppresses the noise, it introduces over- 
smoothing, thus leading to significant loss of image details. The BM3D method delivers 
improved noise suppression and limits the over- smoothing problem to some extent, but 
creates obvious artifacts. By contrast, application of the proposed MSBTD method results in 
noticeably improved noise suppression, while preserving details as compared to other 
methods. Especially, note in the layer boundary preservation in the area magnified by the red 
boxes (e.g. #2 and #4). 

To compare these methods in a quantitative manner, we measured MSR and CNR on six 
regions of interest (similar to the red boxes #1-6 in Fig. 6) from 17 test images of different 
(randomly selected) subjects. In each image, we averaged the MSR and CNR values obtained 
for the five foreground regions (e.g. red box #2-6 in Fig. 6). We report the mean and standard 
deviation of these averaged MSR and CNR results across all the test images in Table 1 . 



Table 1. Mean and standard deviation of the MSR and CNR results for seventeen SDOCT 
retinal images using the Tikhonov [58], New SURE [59], K-SVD [34], BM3D [47], and the 
proposed MSBTD method. The best results in the table are shown in bold. 





Original 


Tikhonov 
[58] 


New SURE 
[59] 


K-SVD 
[34] 


BM3D 
[47] 


MSBTD 


Mean (CNR) 


1.27 


3.13 


2.49 


4.11 


3.89 


4.76 


Standard deviation 
(CNR) 


0.43 


0.94 


0.60 


1.23 


1.05 


1.54 


Mean (MSR) 


3.20 


7.62 


6.74 


11.22 


11.52 


14.76 


Standard deviation 
(MSR) 


0.46 


0.95 


1.69 


2.77 


2.42 


4.75 



Next, we compared the PSNR of these methods for all the seventeen subjects. The mean 
and standard deviation of the PSNR results are reported in Table 2. As for the case of MSR 
and CNR shown in Table 1, we observe that the MSBTD method delivers the best results in 
the PSNR sense. Furthermore, we note that although the PSNR of the K-SVD is close to that 
of the MSBTD method, our clinical collaborators preferred the visual outcome of the MSBTD 
method, as illustrated in Fig. 7. 



Table 2. Mean and standard deviation of the PSNR (dB) for seventeen SDOCT foveal 
images obtained from the Tikhonov [58], New SURE [59], K-SVD [34], BM3D [47], and 
the proposed MSBTD method. The best result in the table is shown in bold. 





Tikhonov 
[58] 


New SURE 
[59] 


K-SVD 
[34] 


BM3D 
[47] 


MSBTD 


Mean (PSNR) 


23.67 


23.46 


26.13 


26.04 


26.46 | 


Standard deviation 
(PSNR) 


0.96 


1.40 


1.70 


1.65 


1.72 



3.2. Experiment 2: denoising based on learned dictionary from a distant high-SNR scan 

In the previous experiments, to denoise the test images, we trained the multiscale structural 
dictionary on a geometrically close (adjacent) averaged image. To test whether the learned 
dictionary is effective for denoising any other macular scan (i.e. distanced up to 3.3mm from 
the fovea), we apply a single learned dictionary to images captured from four different 
locations and compare the results of the MSBTD with those of the Tikhonov [58], New SURE 
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[59], K-SVD [34], and BM3D [47]. The quantitative results are reported in Table 3. Similar to 
Experiment 1, six regions (similar to the red boxes #1-6 in Fig. 8) in each image are selected 
to compute the CNR and MSR metrics (for this case, we could not report PSNR since no valid 
denoised image is available). We observe that compared to the other methods, the MSBTD 
method still achieves the best performance in terms of the quantitative measures (CNR and 
MSR). Furthermore, we provide the qualitative comparison of the MSBTD method with the 
K-SVD and BM3D in Fig. 8. For easy comparison, three boundary regions (boxes #2, 3, 4) in 
each image are magnified as well. As can be observed, the visual appearance of the MSBTD 
is superior to that of the K-SVD and BM3D (and significantly better than the weaker methods 
Tikhonov [58], New SURE [59], results of which omitted to save space). 

Table 3. Mean and standard deviation of the MSR and CNR measures for four test 
images acquired from different positions using the Tikhonov [58], New SURE [59], K- 
SVD [34], BM3D [47], and the MSBTD method. The best results in the table are shown in 

bold. 





Original 


Tikhonov 
[58] 


New SURE 
[59] 


K-SVD 
[34] 


BM3D 
[47] 


MSBTD 


Mean (CNR) 


1.18 


3.26 


2.81 


5.00 


4.93 


6.04 


Standard deviation 
(CNR) 


0.17 


0.22 


0.25 


0.04 


0.39 


0.71 


Mean (MSR) 


3.22 


7.64 


7.58 


12.73 


12.96 


[ 16.32 


Standard deviation 
(MSR) 


0.08 


0.63 


0.48 


1.86 


2.52 


3.95 ' 



On average, denoising one B-scan required about 9 minutes, implemented in MATLAB 
code executed on a desktop computer with an Intel (R) Core i7-2600K CPU and 8 GB of 
RAM. Our code was not optimized for speed. The processing time is expected to be reduced 
significantly by more efficient coding coupled with a general purpose Graphics Processing 
Unit (GPU) [26]. 

4. Discussion and Conclusions 

In this paper, we demonstrated a novel approach called the MSBTD for denoising SDOCT 
images using the sparse representation technique. We proposed to capture SDOCT volumes 
with a non-uniform scanning pattern, where a selected number of B -scans were imaged at 
higher SNR. We learned a sparse representation dictionary for each of these high-SNR 
images, and utilized them to denoise the low-SNR B-scans. We demonstrated that the 
MSBTD method outperforms state-of-the-art denoising methods. To encourage further 
research in this area, we have made the data set and the software that we have developed for 
this project publically available at http://www.duke.edu/-sf59/Fang BOE 2Q12.htm . 

In the experimental section, we selected the parameters for the patch and searching 
window empirically, and kept them fixed. Note that, the optimal choices for such parameters 
should depend on the specific application. For example, as the search window size increases, 
the denoising performance of the MSBTD method is expected to slightly improve. However, 
this improvement in performance comes with the increase in the computational complexity. 
On the other hand, utilizing larger patches suppresses noise more efficiently but results in a 
more aggressive smoothing and the loss of image details. 

An interesting aspect of the MSBTD algorithm is that we only needed to capture very few 
(in our experiments only one) high SNR image for denoising the full volume. We believe the 
reason for this phenomenon is that while retinal SDOCT images from different locations in 
the eye may appear very different, they are all built with similar building blocks (multiscale 
structural dictionary representing retinal layers with different thicknesses and orientations). 
Moreover, we were lucky that our high-SNR images are commonly captured from the fovea, 
in which, retinal layers have a different thickness and orientating in the central region as 
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Fig. 7. Visual comparison of denoising results from the Tikhonov [58], New SURE [59], K- 
SVD [34], BM3D [47], and MSBTD on two SDOCT test images, (a, b) Test SDOCT images, 
(c, d) Averaged images, (e, f) Denoising results using the Tikhonov method (Left: PSNR = 
22.67, Right: PSNR = 24.01). (g, h) Denoising results using the New SURE method (Left: 
PSNR = 25.39, Right: PSNR = 25.35). (i, j) Denoising results using the K-SVD method (Left: 
PSNR = 27.98, Right: PSNR = 25.81). (k, 1) Denoising results using the BM3D method (Left: 
PSNR = 27.72, Right: PSNR = 25.69). (m, n) Denoising results using the MSBTD method 
(Left: PSNR = 28.19, Right: PSNR = 26.06). 
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Fig. 8. Denoising results for test images captured from four different locations processed by the 
same learned dictionary, (a) Summed-voxel projection (SVP) en face image of an AMD eye 
(~6.6mm 2 ). (b, f, j, n) Raw noisy images acquired from the location b-e shown in (a), (c, g, k, 
o) Denoising results using the K-SVD [34]. (d, h, 1, p) Denoising results using the BM3D [47]. 
(e, i, m, q) Denoising results using the MSBTD. 
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compared to the periphery. Such diversity in the structure of foveal B-scan results in more 
efficient and generalizable learned sub -dictionaries. This is important for two reasons. First, it 
means that the total scanning time for capturing a high-SNR volume can be reduced by 
utilizing the MSBTD method. Second, we can retrospectively denoise SDOCT scans from 
many other clinical trials, which use at least two very common scanning protocols, i.e. the 
dense volumetric low-SNR scan, and the so-called limited linear (averaged) high-SNR scan. 

We note that the foveal (averaged) image sequence was captured in less than two seconds. 
This short scanning time reduces the probability of large motion artifacts. In the rare case that 
some of these frames are corrupted by artifacts like blinking, such frames can be easily 
detected by a simple normalized cross-correlation test [57] and removed from the averaging 
set. 

Another interesting case is when the volumetric scan pattern is very dense in the azimuthal 
direction and the neighboring B -scans might have similar structures. Such an approximation is 
more valid in the normal eyes (as compared to the AMD eyes). In such cases, if there is no 
access to the averaged frames, we can approximate the training high-SNR image by averaging 
a few neighboring B -scans, which are away from the fast changing foveal region. 

In this paper, we demonstrated the applicability of our algorithm for denoising images 
from the normal and nonneo vascular AMD subjects. In our future publications, we will 
address the applicability of our denoising method for efficient image analysis in other types of 
retinal diseases including the diabetic macular edema. In addition, it is worthy to note that 
while we used this algorithm for denoising purposes, it is straightforward to use it for many 
other image restoration/enhancement applications such as deblurring, interpolation, and super- 
resolution. Moreover, the proposed MSBTD method is not necessarily the optimal 
compressive sensing approach for denoising SDOCT images. Our method was based on many 
on-the-shelf algorithms originally designed for other models of signal and noise. A more 
efficient adaptation of the compressive sensing technique to the ocular SDOCT signal and 
noise properties and its utilization for a variety of inverse problem applications are parts of 
our ongoing work. 

Acknowledgments 

This work was supported in part by a grant from the American Health Assistance Foundation, grants 
from the NIH 1R21EY021321-01A1, by Research to Prevent Blindness 2011 Duke's 
Unrestricted Grant award, by a grant from the National Natural Science Foundation of China 
(61172161), by the Scholarship Award for Excellent Doctoral Student granted by the Chinese 
Ministry of Education, and by the Fundamental Research Funds for the Central Universities, 
Hunan University. We thank the A2A Ancillary SDOCT Study sites for sharing this deidentified data. 



#164161 - $15.00 USD Received 6 Mar 2012; revised 6 Apr 2012; accepted 10 Apr 2012; published 12 Apr 2012 
(C) 2012 OSA 1 May 2012 / Vol. 3, No. 5 / BIOMEDICAL OPTICS EXPRESS 942 



